Implementing state-of-the-art models for text classification can look daunting, seemingly requiring vast amounts of compute and mathematical rigor. However, the widely used transformers library, a PyTorch-based library for training BERT and BERT-like models, offers a pipelines API that allows anyone to build a working model with cutting-edge performance.
Launched in December 2019, the pipeline API lets developers and data scientists alike apply competitive models to various downstream tasks, including question answering, sentiment analysis, and named entity recognition. Each pipeline ships with a ready-to-use, already fine-tuned model that performs well on its task.
In this blog post, we explore three applications of the pipeline API: sentiment analysis, question answering, and named entity recognition.
To get the sentiment of any sentence or document, we use the ‘sentiment-analysis’ pipeline, which loads the recently released DistilBERT model fine-tuned on the SST-2 dataset (a benchmark dataset for sentiment analysis).
from transformers import pipeline

# Initialise the sentiment analysis pipeline (DistilBERT fine-tuned on SST-2)
nlp = pipeline('sentiment-analysis')

# Statement whose sentiment is to be predicted
statement = "The weather is not that good, and everything else looks pretty chill, though."

# The 'sentiment-analysis' pipeline takes a string as an input
output = nlp(statement)

# Printing the result
print(output)
The given code outputs a list containing a dictionary with the following keys:
‘label’: The predicted sentiment of the statement, represented by its label (for example, POSITIVE or NEGATIVE).
‘score’: The confidence with which the model has predicted the given label.
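The pipeline also accepts a list of strings and returns one result per input. The snippet below is a minimal sketch of that usage; the example statements and the printed formatting are our own additions, and exact scores will vary with the model version.

from transformers import pipeline

nlp = pipeline('sentiment-analysis')

# Passing a list of strings returns one {'label', 'score'} dictionary per input
statements = [
    "Topcoder challenges are a lot of fun.",
    "The build keeps failing and the logs are not helpful."
]
results = nlp(statements)

for text, result in zip(statements, results):
    # Each result carries the predicted label and the model's confidence
    print(f"{text} -> {result['label']} ({result['score']:.3f})")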
For the task of answering a question given some context, the following code enables us to use a Google BERT (2018) variant: BERT Large (whole-word masking) fine-tuned on SQuAD 1.0.
from transformers import pipeline

# Question-answering pipeline (BERT large fine-tuned on SQuAD 1.0)
nlp = pipeline('question-answering')

# Question to be asked to the pipeline
question = "Where is New Delhi?"

# Context, with respect to which the question is to be asked
context = "New Delhi is the capital of India."

# Passing the question and context into the QA pipeline
output = nlp(question=question, context=context)

# Printing the output
print(output)
The given code outputs a dictionary with the following keys:
‘answer’: The answer to the question as a string.
‘score’: The confidence with which the model has predicted the given answer.
‘start’: The character index in the context at which the answer starts.
‘end’: The character index in the context at which the answer ends.
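To make ‘start’ and ‘end’ concrete, the short sketch below (our own addition, reusing the question and context from the snippet above) slices the context with those offsets; assuming they are character indices as described, the slice should reproduce the returned answer.

from transformers import pipeline

nlp = pipeline('question-answering')

question = "Where is New Delhi?"
context = "New Delhi is the capital of India."

output = nlp(question=question, context=context)

# 'start' and 'end' are character offsets into the context,
# so slicing the context with them recovers the answer span
answer_span = context[output['start']:output['end']]
print(answer_span)        # expected to match the value below
print(output['answer'])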
For the task of recognizing named entities within a given document, we use the Facebook AI Research model XLM-R, fine-tuned on the CoNLL-2003 dataset, which achieves state-of-the-art performance with an F1 score of 88.7 (reference).
from transformers import pipeline

# Named entity recognition pipeline (XLM-R fine-tuned by @stefan-it on CoNLL-2003 English)
nlp = pipeline('ner', model='xlm-roberta-large-finetuned-conll03-english')

# Statement in which named entities are to be found
statement = "Topcoder makes me happy."

# Passing the statement to the NER pipeline
output = nlp(statement)

# Printing the output
print(output)
The given code outputs a list of named entities, each represented by a dictionary with the following keys:
‘word’: The named entity (or token) as a string.
‘score’: The confidence with which the model has predicted the given named entity.
‘entity’: The type of named entity, for example ‘I-PER’ for a person and ‘I-LOC’ for a location.
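Since the raw output is a list of per-token dictionaries, a small post-processing loop is often handy. The sketch below is our own illustration, using only the ‘word’, ‘entity’, and ‘score’ fields described above; note that a subword tokenizer such as XLM-R's may split a single word across several entries.

from transformers import pipeline

nlp = pipeline('ner', model='xlm-roberta-large-finetuned-conll03-english')

output = nlp("Topcoder makes me happy.")

# Print each detected (sub)token with its entity tag and confidence
for entity in output:
    print(f"{entity['word']}: {entity['entity']} ({entity['score']:.2f})")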
This article shows that models competitive with the state of the art can be used efficiently through the PyTorch-based transformers library, which not only helps in creating prototype models for various NLP tasks but also makes it easy to deploy them at industry scale.